In this paper, we explore the possibilities and limitations of recovering sparse signals in an online
fashion. Employing a mean field approximation to the Bayes recursion formula yields an online
signal recovery algorithm that can be performed with a computational cost that is linearly
proportional to the signal length per update. Analysis of the resulting algorithm indicates that the online
algorithm asymptotically saturates the optimal performance limit achieved by the offline method in
the presence of Gaussian measurement noise, while differences in the allowable computational costs
may result in fundamental gaps of the achievable performance in the absence of noise.


Continuous innovations in measurement technologies
have enabled the collection of high-dimensional data of
various objects. This trend has created new research
areas such as bioinformatics that apply knowledge and
techniques from information science to the natural sciences
in order to efficiently extract information from
data. The importance of techniques for efficient information
extraction has been growing more than ever in
various fields of science and engineering.

Compressed sensing (CS), which is a framework for
signal processing that is currently under development,
is a successful resource that has produced many such
techniques. In general, CS aims to realize high-performance
signal processing by exploiting the prior
knowledge of objective signals, particularly their sparsity.
That is, CS utilizes the fact that real-world signals
can typically be represented by a small combination of
elemental components. In a standard scenario, CS utilizes
this property to enable the recovery of various signals
from far fewer linear measurements than
required by the Nyquist-Shannon theorem.


Besides the standard scenario, the concept of CS is
now spreading in various directions. For instance, in a
remote sensing situation where a sensor transmits data
to a distant data center, the amount of
data to be sent through a narrowband communication
channel may be the biggest hindrance to efficient signal
processing. As a practical solution to such a difficulty,
1-bit CS is a proposed scheme for recovering sparse signals
by using only the sign information of the measurement
results. Another variation can be utilized when
multiple signals are observed in a distributed manner.
In such cases, the signal recovery performance can also
be enhanced by exploiting information on correlations
among the signals. This is referred to as distributed CS.


In this letter, we explore the possibilities and limitations
of another variation of CS, which we call online CS.
In this scheme, to minimize the computation and memory
costs as much as possible, measured data are used for
signal recovery only once and discarded after that. This
approach towards information processing is a promising
technique for when devices of low computational capability
are used for signal recovery; such situations can arise
in sensor networks, multi-agent systems, micro-devices,
and so on. It is also advantageous when the signal source
is time-variant.


Historically, online information processing was actively
investigated by the physics community more than two
decades ago in the context of learning by neural networks.
However, the utility of sparsity was not fully recognized
at that time, so the potential capability of online
CS remains an open question. To clarify this issue, we
focus on the performance obtained when Bayesian inference
is carried out in an online manner. Bayesian inference is
guaranteed to yield the optimal performance when the signal
recovery is carried out in an offline (batch) manner.

Problem setup. As a general scenario, we consider a
situation where an $N$-dimensional signal $x^0 = (x_i^0) \in \mathbb{R}^N$
is sequentially measured by taking the inner product
$u^{0,t} = \Phi^t \cdot x^0$ for a random measurement vector
$\Phi^t = (\Phi_i^t) \in \mathbb{R}^N$. Here, we assume that each component of
$x^0$ is independently generated from an identical sparse
distribution $\phi(x) = (1-\rho)\delta(x) + \rho f(x)$ and that each component
of $\Phi^t$ independently follows a distribution of zero
mean and variance $N^{-1}$, where $0 < \rho < 1$, $i = 1, 2, \ldots, N$,
and $f(x)$ is a density function that does not have a finite
mass at the origin. The index $t = 1, 2, \ldots$ counts the number
of measurements. For each measurement, the output
$y^t$, which may be a continuous or discrete variable, is
sampled from a conditional distribution $P(y^t|u^{0,t})$. Here,
our goal is to accurately recover $x^0$ based on the knowledge
of $D^t = \{(\Phi^1, y^1), (\Phi^2, y^2), \ldots, (\Phi^t, y^t)\}$ and the
functional forms of $\phi(x)$ and $P(y|u)$ while minimizing
the necessary computational cost.
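The measurement process described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: $f(x)$ is taken to be a zero-mean Gaussian (the Bernoulli-Gaussian case used later in the examples), the measurement vector is drawn from a Gaussian with variance $1/N$, and the output channel is additive Gaussian noise. The function names `sample_signal` and `measure` are hypothetical.

```python
import math
import random


def sample_signal(n, rho, sigma, rng):
    """Draw x0 with i.i.d. entries: 0 with prob. 1 - rho, else N(0, sigma^2)."""
    return [rng.gauss(0.0, sigma) if rng.random() < rho else 0.0 for _ in range(n)]


def measure(x0, noise_std, rng):
    """One sequential measurement: draw Phi (variance 1/N), return (Phi, y)."""
    n = len(x0)
    phi = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
    u0 = sum(p * x for p, x in zip(phi, x0))      # inner product Phi . x0
    y = u0 + rng.gauss(0.0, noise_std)            # assumed Gaussian output channel
    return phi, y


rng = random.Random(0)
x0 = sample_signal(1000, rho=0.1, sigma=1.0, rng=rng)
phi, y = measure(x0, noise_std=0.1, rng=rng)
```

In the online setting each pair `(phi, y)` would be consumed by one update and then discarded, so only the current parameter estimates need to be stored.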

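Bayesian signal recovery and online algorithm. Let
$\hat{x}(D^t)$ denote the estimate of $x^0$ by an arbitrary recovery
scheme. The standard measure of the accuracy
of $\hat{x}(D^t)$ is the mean square error (MSE)
mse $= N^{-1}\left[\,|\hat{x}(D^t) - x^0|^2\,\right]_{D^t, x^0}$, where $[\cdots]_X$ generally
indicates the average operation with respect to $X$. Fortunately,
under the current setup, Bayes' theorem, which
is given by

$$P(x|D^t) = \frac{\prod_{\mu=1}^{t} P(y^\mu|u^\mu)\,\prod_{i=1}^{N}\phi(x_i)}{\int dx\,\prod_{\mu=1}^{t} P(y^\mu|u^\mu)\,\prod_{i=1}^{N}\phi(x_i)} \tag{1}$$

with $u^\mu = \Phi^\mu \cdot x$, guarantees that mse is minimized by the minimum mean
square error estimator $\hat{x}^{\rm mmse}(D^t) = \int dx\, x\, P(x|D^t)$.
However, evaluating $\hat{x}^{\rm mmse}(D^t)$ exactly is, unfortunately,
computationally difficult.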

To practically resolve this difficulty, we introduce the
following two devices:

• Online update: We rewrite (1) in the form of a
recursion formula:

$$P(x|D^{t+1}) = \frac{P(y^{t+1}|u^{t+1})\,P(x|D^t)}{\int dx\, P(y^{t+1}|u^{t+1})\,P(x|D^t)}. \tag{2}$$

• Mean field approximation: To make the necessary
computation tractable, we approximate
$P(x|D^t)$ by a factorized distribution of the exponential
family as

$$P(x|D^t) \simeq \prod_{i=1}^{N} \frac{e^{-a_i^t x_i^2/2 + h_i^t x_i}\,\phi(x_i)}{Z(a_i^t, h_i^t)} \tag{3}$$

while utilizing the set of natural parameters $\{(a_i^t, h_i^t)\}$,
where $Z(a_i^t, h_i^t) = \int dx_i\, e^{-a_i^t x_i^2/2 + h_i^t x_i}\,\phi(x_i)$.

Introducing online computation to the Bayesian inference
based on conversion from (1) to (2) has also been
proposed in earlier studies on the learning of neural networks
[13, 11]. On the other hand, the parameterization
of (3) for the approximate tractable distribution may not
have been popular for the online learning of neural networks.

When the prior distribution of $x$ is smooth, which
is typically the case in neural network models, the
posterior distribution is expected to asymptotically approach
a Gaussian distribution. This means that, at
least in the asymptotic region of $\alpha = t/N \gg 1$, the
posterior distribution can be closely approximated as
$P(x|D^t) \propto \exp\left(-(x - m^t)^{\rm T}(C^t)^{-1}(x - m^t)/2\right)$ by employing
the mean $m^t = \int dx\, x\, P(x|D^t)$ and covariance
$C^t = \int dx\,(x x^{\rm T})\,P(x|D^t) - m^t (m^t)^{\rm T}$ as parameters, where
T denotes the matrix transpose. Supposing this property,
earlier studies derived update rules directly for $m^t$
and $C^t$. However, in the current case, the strong singularity
of the prior $\phi(x)$, which originates from the
component of $\delta(x)$, prevents $P(x|D^t)$ from converging
to a Gaussian distribution, even for $\alpha \gg 1$. To overcome
this inconvenience, we derived update rules for
$\{(a_i^t, h_i^t)\}$ based on the expression of (3) and computed
the means and variances as $m_i^t = (\partial/\partial h_i^t) \ln Z(a_i^t, h_i^t)$ and
$v_i^t = (\partial^2/(\partial h_i^t)^2) \ln Z(a_i^t, h_i^t)$, respectively.
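To make the last step concrete, the sketch below evaluates the mean and variance obtained from $\ln Z(a, h)$ for one specific prior. It assumes the Bernoulli-Gaussian form $\phi(x) = (1-\rho)\delta(x) + \rho\,\mathcal{N}(x; 0, \sigma^2)$, for which a direct Gaussian integral gives $Z(a,h) = (1-\rho) + \rho\,(1+a\sigma^2)^{-1/2}\exp\!\left(h^2\sigma^2/(2(1+a\sigma^2))\right)$. The function name `bg_moments` and the internal variable names are hypothetical, and the closed forms are our own working rather than expressions quoted from the text.

```python
import math


def bg_moments(a, h, rho, sigma2):
    """Return (m, v) = (d ln Z / dh, d^2 ln Z / dh^2) for a Bernoulli-Gaussian prior.

    Requires 1 + a * sigma2 > 0.
    """
    s = sigma2 / (1.0 + a * sigma2)   # variance of the nonzero (Gaussian) part
    mu = h * s                        # mean of the nonzero part
    # log-odds that the component is nonzero rather than exactly zero
    logit = math.log(rho / (1.0 - rho)) - 0.5 * math.log1p(a * sigma2) + 0.5 * h * mu
    w = 0.5 * (1.0 + math.tanh(0.5 * logit))   # overflow-safe logistic function
    m = w * mu
    v = w * s + w * (1.0 - w) * mu * mu
    return m, v
```

Writing the weight `w` through its log-odds avoids evaluating the exponential in $Z$ directly, which would overflow once $h$ becomes large after many updates.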

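Let $\{(a_i^t, h_i^t)\}$ be given, so that $\{(m_i^t, v_i^t)\}$ is also
available. The update rule $(a_i^t, h_i^t) \to (a_i^{t+1}, h_i^{t+1})$ is derived
by inserting the expression of (3) into the right-hand
side of (2) and integrating the resultant expression with
respect to $x$ except for $x_i$. In the integration, we approximate
$\sum_{j \neq i} \Phi_j^{t+1} x_j$ as a Gaussian random variable
whose mean and variance are $\Delta_i^{t+1} = \sum_{j \neq i} \Phi_j^{t+1} m_j^t$ and
$\chi_i^{t+1} = \sum_{j \neq i} (\Phi_j^{t+1})^2 v_j^t$, respectively. This is
supported by the assumption for the distribution of the
measurement vectors $\Phi^t$. By employing this Gaussian
approximation to evaluate the integral and expanding the
resultant expression up to the second order in $\Phi_i^{t+1} x_i$, we
can obtain the online signal recovery algorithm as follows:

$$
\begin{aligned}
a_i^{t+1} &= a_i^t - (\Phi_i^{t+1})^2 \frac{\partial^2}{\partial(\Delta^{t+1})^2} \ln \int Dz\, P\!\left(y^{t+1} \,\middle|\, \Delta^{t+1} + \sqrt{\chi^{t+1}}\, z\right), \\
h_i^{t+1} &= h_i^t + \Phi_i^{t+1} \frac{\partial}{\partial \Delta^{t+1}} \ln \int Dz\, P\!\left(y^{t+1} \,\middle|\, \Delta^{t+1} + \sqrt{\chi^{t+1}}\, z\right)
 - m_i^t (\Phi_i^{t+1})^2 \frac{\partial^2}{\partial(\Delta^{t+1})^2} \ln \int Dz\, P\!\left(y^{t+1} \,\middle|\, \Delta^{t+1} + \sqrt{\chi^{t+1}}\, z\right),
\end{aligned}
\tag{4}
$$

where $\Delta^t = \sum_{i=1}^{N} \Phi_i^t m_i^{t-1}$, $\chi^t = \sum_{i=1}^{N} (\Phi_i^t)^2 v_i^{t-1}$, and
$Dz = dz \exp(-z^2/2)/\sqrt{2\pi}$ represents the Gaussian measure.
Note that the necessary cost of computation for performing
(4) is $O(N)$ per update. This means that the total
computational cost for the recovery when using $t = \alpha N$
measurements is $O(N^2)$, which is comparable to the cost
per update of existing fast offline signal recovery algorithms.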
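The following is a minimal sketch of one way the update (4) could be coded. It is not the authors' implementation. It assumes the Gaussian output channel $P(y|u) = \mathcal{N}(y; u, \sigma_n^2)$, for which $\int Dz\,P(y|\Delta + \sqrt{\chi}z) = \mathcal{N}(y; \Delta, \sigma_n^2 + \chi)$, so the two derivatives in (4) become $(y-\Delta)/(\sigma_n^2+\chi)$ and $-1/(\sigma_n^2+\chi)$. It also assumes a Bernoulli-Gaussian prior and Gaussian measurement vectors of variance $1/N$. All function names (`bg_moments`, `online_update`, `run_demo`) and default parameter values are hypothetical choices for illustration.

```python
import math
import random


def bg_moments(a, h, rho, sigma2):
    """Mean and variance of exp(-a x^2/2 + h x) phi(x) / Z, Bernoulli-Gaussian phi."""
    s = sigma2 / (1.0 + a * sigma2)
    mu = h * s
    logit = math.log(rho / (1.0 - rho)) - 0.5 * math.log1p(a * sigma2) + 0.5 * h * mu
    w = 0.5 * (1.0 + math.tanh(0.5 * logit))   # posterior weight of the nonzero part
    return w * mu, w * s + w * (1.0 - w) * mu * mu


def online_update(a, h, phi, y, rho, sigma2, noise_var):
    """Apply one O(N) update of the natural parameters (a, h) in place."""
    moments = [bg_moments(ai, hi, rho, sigma2) for ai, hi in zip(a, h)]
    delta = sum(p * m for p, (m, _) in zip(phi, moments))
    chi = sum(p * p * v for p, (_, v) in zip(phi, moments))
    s = noise_var + chi
    g1 = (y - delta) / s    # first derivative of ln N(y; delta, s) w.r.t. delta
    g2 = -1.0 / s           # second derivative
    for i, (p, (m, _)) in enumerate(zip(phi, moments)):
        a[i] -= p * p * g2
        h[i] += p * g1 - m * p * p * g2


def run_demo(n=200, alpha=3.0, rho=0.1, sigma2=1.0, noise_var=0.01, seed=0):
    """Stream alpha * n measurements, using each once; return the final per-component MSE."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0.0, math.sqrt(sigma2)) if rng.random() < rho else 0.0
          for _ in range(n)]
    a, h = [0.0] * n, [0.0] * n
    for _ in range(int(alpha * n)):
        phi = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
        y = sum(p * x for p, x in zip(phi, x0)) + rng.gauss(0.0, math.sqrt(noise_var))
        online_update(a, h, phi, y, rho, sigma2, noise_var)   # (phi, y) is then discarded
    est = [bg_moments(ai, hi, rho, sigma2)[0] for ai, hi in zip(a, h)]
    return sum((e - x) ** 2 for e, x in zip(est, x0)) / n


if __name__ == "__main__":
    print("final mse:", run_demo())
```

Only the two length-$N$ lists `a` and `h` persist between updates, which is where the $O(N)$ memory and per-update cost come from. Other channels would only change how `g1` and `g2` are computed.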

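Macroscopic analysis. Because $\Phi^t$ and $y^t$ are random
variables, (4) constitutes a pair of stochastic difference
equations. However, because $\Phi_i^t \sim O(N^{-1/2})$, the difference
with each update becomes infinitesimally small as
$N$ grows. This property makes it possible to reduce (4)
to a set of ordinary differential equations

$$
\begin{aligned}
\frac{d\hat{Q}}{d\alpha} &= -\mathrm{Tr}_y \int Dv\, Du\, P\!\left(y \,\middle|\, \tfrac{m}{\sqrt{q}}\, v + \sqrt{Q_0 - \tfrac{m^2}{q}}\, u\right)
 \frac{\partial^2}{\partial(\sqrt{q}\,v)^2} \ln \int Ds\, P\!\left(y \,\middle|\, \sqrt{q}\, v + \sqrt{Q - q}\, s\right), \\
\frac{d\hat{q}}{d\alpha} &= \mathrm{Tr}_y \int Dv\, Du\, P\!\left(y \,\middle|\, \tfrac{m}{\sqrt{q}}\, v + \sqrt{Q_0 - \tfrac{m^2}{q}}\, u\right)
 \left(\frac{\partial}{\partial(\sqrt{q}\,v)} \ln \int Ds\, P\!\left(y \,\middle|\, \sqrt{q}\, v + \sqrt{Q - q}\, s\right)\right)^{2}, \\
\frac{d\hat{m}}{d\alpha} &= \mathrm{Tr}_y \int Dv \left(\frac{\partial}{\partial(m v/\sqrt{q})} \int Du\, P\!\left(y \,\middle|\, \tfrac{m}{\sqrt{q}}\, v + \sqrt{Q_0 - \tfrac{m^2}{q}}\, u\right)\right)
 \frac{\partial}{\partial(\sqrt{q}\,v)} \ln \int Ds\, P\!\left(y \,\middle|\, \sqrt{q}\, v + \sqrt{Q - q}\, s\right),
\end{aligned}
\tag{5}
$$

in the limit of $N, t \to \infty$ but keeping $\alpha = t/N$ finite,
where $Q_0 = \int dx\, \phi(x)\, x^2$, $\mathrm{Tr}_y$ denotes the integration or
summation with respect to $y$, and $q$, $m$, and $Q$ are evaluated
as $q = \int dx^0 \phi(x^0)\, Dz\, \langle x \rangle^2$, $m = \int dx^0 \phi(x^0)\, Dz\, x^0 \langle x \rangle$,
and $Q = q + \int dx^0 \phi(x^0)\, Dz\, \partial \langle x \rangle / \partial(\sqrt{\hat{q}}\, z)$ using
$\langle x \rangle = (\partial/\partial(\sqrt{\hat{q}}\, z)) \ln Z(\hat{Q}, \sqrt{\hat{q}}\, z + \hat{m} x^0)$.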

Two issues are of note here. First, replacing
$(d\hat{Q}/d\alpha, d\hat{q}/d\alpha, d\hat{m}/d\alpha)$ with $(\hat{Q}/\alpha, \hat{q}/\alpha, \hat{m}/\alpha)$ in (5)
yields the exact equation of state for the Bayesian offline
signal recovery, which is derived by the replica or
cavity method [13, 17]. This implies that the differences
in the macroscopic descriptions (i.e., the use of differential
instead of algebraic equations) characterize the fundamental
limitations on the achievable performance of
the online method (4) compared to the offline method.
Second, similar to the Bayes optimal case for the offline
recovery, the equation of state (5) allows a solution with
$\hat{Q} = \hat{q} = \hat{m}$, $Q = Q_0$, and $q = m$. Focusing on the solution
of this type simplifies (5) to

$$
\frac{d\hat{q}}{d\alpha} = \mathrm{Tr}_y \int Dv\, Du\, P\!\left(y \,\middle|\, \sqrt{q}\, v + \sqrt{Q_0 - q}\, u\right)
 \left(\frac{\partial}{\partial(\sqrt{q}\,v)} \ln \int Ds\, P\!\left(y \,\middle|\, \sqrt{q}\, v + \sqrt{Q_0 - q}\, s\right)\right)^{2},
\tag{6}
$$

where $q = \int dx^0 \phi(x^0)\, Dz \left[(\partial/\partial(\sqrt{\hat{q}}\, z)) \ln Z(\hat{q}, \sqrt{\hat{q}}\, z + \hat{q} x^0)\right]^2$.
Because numerical computations indicate that this
solution is the unique attractor of (5), we examined the
performance of the online algorithm by utilizing (6).

Examples. We tested the developed methodologies
on two representative scenarios of CS. The first is
the standard CS, which is characterized by $P(y|u) =
(2\pi\sigma_n^2)^{-1/2} \exp\left(-(y-u)^2/(2\sigma_n^2)\right)$. The other is the 1-bit
CS, which is modeled by $P(y|u) = \int Dz\, \Theta(yu + \sigma_n z)$.
Here, $y \in \{+1, -1\}$, and $\Theta(x) = 1$ for $x > 0$ and 0 otherwise.
For practical relevance, we considered situations
where each measurement was degraded by Gaussian noise
of zero mean and variance $\sigma_n^2$ for both cases. However,
by setting $\sigma_n = 0$, we can also evaluate the performance
of a noiseless setup. For the generative model of sparse
signals, we considered the case of the Bernoulli-Gaussian
prior $\phi(x) = (1-\rho)\delta(x) + \rho\,(2\pi\sigma^2)^{-1/2} \exp\left(-x^2/(2\sigma^2)\right)$,
which means that $Q_0 = \rho\sigma^2$.
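For both measurement models, the Gaussian average $E(\Delta) = \int Dz\, P(y|\Delta + \sqrt{\chi}\,z)$ that enters the update rule has a closed form, which the sketch below uses. These closed forms are our own working under the two models just stated: a Gaussian density with variance $\sigma_n^2 + \chi$ for standard CS, and a Gaussian cumulative distribution function evaluated at $y\Delta/\sqrt{\sigma_n^2+\chi}$ for 1-bit CS. The function names are hypothetical, and the tail branch in `one_bit_channel` is a leading-order approximation added only for numerical safety.

```python
import math


def gaussian_channel(y, delta, chi, noise_var):
    """Standard CS: return (ln E, d ln E / d delta, d^2 ln E / d delta^2)."""
    s = noise_var + chi
    log_e = -0.5 * (y - delta) ** 2 / s - 0.5 * math.log(2.0 * math.pi * s)
    return log_e, (y - delta) / s, -1.0 / s


def one_bit_channel(y, delta, chi, noise_var):
    """1-bit CS with y in {+1, -1}: same triple, with E = Phi_cdf(y * delta / s)."""
    s = math.sqrt(noise_var + chi)          # requires noise_var + chi > 0
    t = y * delta / s
    if t > -30.0:
        cdf = 0.5 * math.erfc(-t / math.sqrt(2.0))
        log_e = math.log(cdf)
        r = math.exp(-0.5 * t * t) / (math.sqrt(2.0 * math.pi) * cdf)
    else:
        # far tail: leading-order expansion, avoids log(0) from underflow
        x = -t
        log_e = -0.5 * x * x - math.log(x * math.sqrt(2.0 * math.pi))
        r = x + 1.0 / x
    return log_e, y * r / s, -r * (t + r) / (s * s)
```

The second and third return values are the first and second derivatives that the update rule needs. In the noiseless 1-bit case (`noise_var = 0`) the expression stays well defined as long as `chi` is positive.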

Fig. 1 compares mse from the experimental results
obtained with (4) and the theoretical predictions. The
experimental results represent averages over 1000 samples,
while the theoretical predictions were evaluated by
solving (6) with the use of the Runge-Kutta method.
With the exception of noiseless standard CS, where numerical
accuracy becomes an issue because of the extremely
small values of mse, the experimental data extrapolated
to $N \to \infty$ exhibited excellent agreement with
the theoretical predictions. Note that the finite-$N$ data
were biased monotonically upward, with a larger bias for
smaller $N$ and larger $\alpha$.
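A theoretical curve of this kind can be sketched for the standard CS channel with Gaussian noise. The code below rests on several assumptions that are ours rather than the text's. For this channel the right-hand side of (6) evaluates to $1/(\sigma_n^2 + Q_0 - q)$, which follows from a direct Gaussian integral. The initial condition is taken as $\hat{q}(0) = 0$. The overlap $q(\hat{q})$ for the Bernoulli-Gaussian prior is computed by a simple trapezoidal rule, and mse is read off as $Q_0 - q$ on the branch $q = m$, $Q = Q_0$. The classical fourth-order Runge-Kutta scheme and all function names are illustrative choices.

```python
import math


def bg_mean(a, h, rho, sigma2):
    """Posterior mean d ln Z / dh for the Bernoulli-Gaussian prior."""
    s = sigma2 / (1.0 + a * sigma2)
    mu = h * s
    logit = math.log(rho / (1.0 - rho)) - 0.5 * math.log1p(a * sigma2) + 0.5 * h * mu
    return 0.5 * (1.0 + math.tanh(0.5 * logit)) * mu


def overlap_q(qhat, rho, sigma2, n_grid=801, z_max=8.0):
    """q(qhat) = E[<x>^2] with h = sqrt(qhat) z + qhat x0, by the trapezoidal rule."""
    dz = 2.0 * z_max / (n_grid - 1)
    std_zero = math.sqrt(qhat)                          # std of h when x0 = 0
    std_slab = math.sqrt(qhat + qhat * qhat * sigma2)   # std of h when x0 ~ N(0, sigma2)
    total = 0.0
    for k in range(n_grid):
        z = -z_max + k * dz
        weight = dz * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        if k == 0 or k == n_grid - 1:
            weight *= 0.5
        m_zero = bg_mean(qhat, std_zero * z, rho, sigma2)
        m_slab = bg_mean(qhat, std_slab * z, rho, sigma2)
        total += weight * ((1.0 - rho) * m_zero ** 2 + rho * m_slab ** 2)
    return total


def solve_mse(alpha_max=3.0, steps=60, rho=0.1, sigma2=1.0, noise_var=0.01):
    """Integrate d qhat / d alpha = 1 / (noise_var + Q0 - q(qhat)); return [(alpha, mse)]."""
    q0 = rho * sigma2

    def rhs(qhat):
        return 1.0 / (noise_var + q0 - overlap_q(qhat, rho, sigma2))

    d_alpha = alpha_max / steps
    qhat, curve = 0.0, []          # assumed initial condition: no information at alpha = 0
    for step in range(1, steps + 1):
        k1 = rhs(qhat)
        k2 = rhs(qhat + 0.5 * d_alpha * k1)
        k3 = rhs(qhat + 0.5 * d_alpha * k2)
        k4 = rhs(qhat + d_alpha * k3)
        qhat += d_alpha * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        curve.append((step * d_alpha, q0 - overlap_q(qhat, rho, sigma2)))
    return curve
```

The noiseless case is deliberately left out of this sketch: with `noise_var = 0` the denominator approaches zero as $q \to Q_0$, which is the regime where the text reports numerical accuracy becoming an issue.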

For noiseless standard CS, the offline reconstruction
achieves mse = 0 when $\alpha$ is greater than a certain critical
ratio $0 < \alpha_c(\rho) < 1$ [13]. On the other hand, the analysis
based on (6) indicated that mse $\sim O(\exp(-\alpha/\rho))$
holds for large $\alpha$, which means that perfect recovery is
unfortunately impossible with (3). However, this result
may still promote the use of the online algorithm for very
sparse signals with $0 < \rho \ll 1$, where $\exp(-\alpha/\rho)$ becomes
negligible.

For noiseless 1-bit CS, an earlier result meant that
mse $\simeq (Q_0/2)(\cdots)$ was asymptotically achieved by the
offline method, where $K = 0.3603\ldots$ On the other hand,
(6) yielded the asymptotic form mse $\simeq 2Q_0(\cdots)$ for
$\alpha \gg 1$. This indicates that online recovery can save
computation and memory costs considerably while sacrificing
mse by only a factor of 4 asymptotically.

FIG. 1: Comparison between mse from experimental results and theoretical predictions for $\rho = 0.1$. The crosses
correspond to $N$ = 200, 500, 1000, 2000, and 4000 (in descending order), and the white circles are the extrapolations
of the data to $N \to \infty$ by quadratic fitting. The curves represent the theoretical performances of the online
(continuous) and batch (dashed) reconstructions. The disagreement between the experimental and theoretical
results in the noiseless standard CS case was due to the limited numerical accuracy of the computational
environment used in this study. Also in this case, batch reconstruction achieves mse = 0 for $\alpha$ larger than $\alpha_c(\rho) < 1$.

These results may imply that there are fundamental
gaps in the asymptotically achievable performance limit
depending on the allowable computational costs in the
absence of noise. However, this is not the case when
Gaussian measurement noise is present, for which $P(y|u)$
becomes differentiable with respect to $u$. This property
guarantees that $\hat{q} \simeq I\alpha$ asymptotically holds for both the
online and offline methods, which yields a universal
expression for the asymptotic MSE: mse $\simeq 2\rho/\hat{q} = 2\rho/(I\alpha)$,
where

$$
I = \mathrm{Tr}_y \int Dv\, P\!\left(y \,\middle|\, \sqrt{Q_0}\, v\right) \left(\frac{\partial}{\partial(\sqrt{Q_0}\, v)} \ln P\!\left(y \,\middle|\, \sqrt{Q_0}\, v\right)\right)^{2}
\tag{7}
$$

represents the Fisher information of the measurement
model $P(y|u)$ averaged over the generation of $u^0$. This
highlights the potential utility of the online algorithm and
indicates that a performance similar to that of the offline
method can be asymptotically achieved with a significant
reduction in computational costs.
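As a quick consistency check of (7), consider the standard CS channel with Gaussian noise, writing $\sigma_n^2$ for the noise variance (a labeling adopted here for the example). The short calculation below is offered as a sketch, not as a result quoted from the analysis:

$$
\frac{\partial}{\partial u} \ln P(y|u) = \frac{y - u}{\sigma_n^2},
\qquad
I = \int Dv \int dy\, P\!\left(y \,\middle|\, \sqrt{Q_0}\, v\right) \frac{(y - \sqrt{Q_0}\, v)^2}{\sigma_n^4} = \frac{1}{\sigma_n^2}.
$$

Inserting this into the universal expression gives mse $\simeq 2\rho\sigma_n^2/\alpha$: in this example the asymptotic error is proportional to the noise variance and the signal density, and decays inversely with the number of measurements per signal component.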

Summary and discussion. We developed an online algorithm
to perform Bayesian inference on the signal recovery
problem of CS. The algorithm can be carried out
with $O(N)$ computational and memory costs per update,
which are considerably less than those of the offline algorithm.
From the algorithm, we also derived ordinary
differential equations with respect to macroscopic variables
that were utilized for the performance analysis. Our
analysis indicated that the online algorithm can asymptotically
achieve the same MSE as the offline algorithm
with a significant reduction of computational costs in the
presence of Gaussian measurement noise, while there may
exist certain fundamental gaps in the achievable performance
depending on usable computational resources in
the absence of noise. Numerical experiments based on
the standard and 1-bit scenarios supported our analysis.

Here, we assumed that correct knowledge about the
prior distribution of signals and the measurement model
is provided in order to evaluate the potential ability of
the online algorithm. However, such information is not
necessarily available in practical situations. Incorporating
the idea of online inference into situations lacking
correct prior knowledge is an important and challenging
future task.